*This version created June 4, 2018
*final update/annotation June 24, 2021
*PURPOSE: merge the spending survey data (monthly nondurables spending and related variables) with the expectations survey data (inflation expectations, unemployment expectations, etc)
*brief description of merge method: as a first choice, the "current" expectations are those reported in the first 10 days of the spending month; if none such are available we choose expectations
*dated up to 5 weeks prior to the spending month; if none such are available the spending observation is considered "unmatched" and is dropped
*lagged expectations are also created, and in such a way as to avoid overlap between "current" and "lagged" expectations  

 
clear
clear matrix
set more off
set scheme s1color
estimates clear
graph drop _all
set matsize 2500
log close _all

* Set Directory
cd "../Do"

** Locals

#delimit ;
local spendcats "
electricity water heatingfuel phonecable housecleaningproducts housecleaningservice
gardenproducts gardenservice clothing personalcare drugs healthcareservices medsupplies entertainment
hobbies personalservices otherchildspending foodhome foodout gasoline";


local spendcats_long 
"pricefridge priceoven pricedwasher pricewasher pricetv pricecomputer furniture";

#delimit cr

local expectations "d_inflmedian d_inflquant1 d_inflquant2 q7 q6 q2a q40 d_wagemedian d_wagequant1 d_wagequant2 d_longinflquant2 d_longinflquant1 d_longinflmedian"

local housing "mortgage rent"

local bought "boughtcar boughtfridge boughtoven boughtwasher boughtdwasher boughttv boughtcomputer"

local q_extra "homeins proptax carins carmaint healthins trips homerepair homerepair_services gifts charity downpmt_car amtstocks"

*loads spending module data (from RAND-ALP)
use ../Data/cons_panel2.dta

*drop month year 

sort prim_key date

*surv_month is the month in which the spending survey was completed
gen surv_month = mofd(date)
format surv_month %tm

/* this shows that after november 2011, durables are collected each month.
sort month
collapse (sum) `spendcats_long' , by(month)
browse month `spendcats_long'
ak
*/

sort prim_key surv_month


*MB: the income variable below is monthly and not reliable; annual household income (reliable) created later; keeping this code just to avoid nuisance errors later
egen faminc = rowtotal(income spouseincome) , missing

*Create month in which spending took place by lagging the survey-completion month by 1, so we can match the expectations timing to the spending period; expectations are as of the date reported, not backward-looking or recalled
*this is done because the consumption took place in the month (or perhaps in the entire quarter, in the case of durables) before the survey was taken--they are asked to report spending in the previous month or quarter
gen spend_month = surv_month-1
format spend_month %tm

*below is first day of the spending month (in case of nondurables)
gen spend_month_firstday = dofm(spend_month)
format spend_month_firstday %td
*this identifies the spending quarter, based on the first day of the last month of the spending quarter referenced in the survey (not the only way we might have done this, but it gives the correct quarter)
gen quarter = qofd(spend_month_firstday)
format quarter %tq
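* Illustration only (hypothetical date, not taken from the data): a spending survey completed
* on 15mar2012 has surv_month 2012m3, so its spending is assigned to spend_month 2012m2, whose
* first day is 01feb2012, which falls in 2012q1. The asserts below use constants only and do
* not read or change the data in memory; ex_surv is a local introduced just for this example.
local ex_surv = td(15mar2012)
assert mofd(`ex_surv') - 1 == tm(2012m2)
assert dofm(mofd(`ex_surv') - 1) == td(01feb2012)
assert qofd(dofm(mofd(`ex_surv') - 1)) == tq(2012q1)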

*NOTE below pertains to durable goods spending measures which are here in the background but not the object of merging in this do file--can ignore 
* here the quarter is generated from the first day of the third month in the spending quarter. Because
* the question asks about the past 3 months of spending, the 1-month lag (relative to survey completion) ensures
* that the durable spending variables are assigned to the quarter
* in which the spending took place. Warning: don't specify durables by month. Only by quarter.

sort prim_key spend_month
*this drops the second (or third) observation for nondurables spending if two are observed for the same person within the same spending month: this drops just 14 observations
*sorting on date within each group makes "second (or third)" well defined: the earliest-dated survey is kept (a plain sort leaves the order of ties arbitrary)
bysort prim_key spend_month (date): keep if _n == 1
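* Optional check (added as a sketch): after the de-duplication above, prim_key and spend_month
* should jointly identify the observations, which the 1:1 merges further down require.
* isid stops with an error if they do not; missok lets spend_month be missing.
isid prim_key spend_month, missok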



tempfile nondurables
save `nondurables'
count
*above command counts number of obs--it equals roughly 83,000 at this point

*following dataset contains just the raw expectations data (not also spending) 
use ../Data/complete_panel_alldates.dta, clear

sort unique survey 
*MB: this merges on household ID and survey wave (different people complete a given wave on different dates): we are merging in the derived wage density data
merge 1:1 unique survey using ../Data/wagedensity_data_long.dta , gen(wage_merge)
tab wage_merge

*same:  merging the derived density data on one-year-ahead inflation expectations
sort unique survey 
merge 1:1 unique survey using ../Data/infl_density_data_long.dta, gen(infl_merge)
tab infl_merge

* merging the derived density data on long-run inflation expectations
sort unique survey
merge 1:1 unique survey using ../Data/longinfl_density_data_long.dta, gen(longinfl_merge)
tab longinfl_merge

* this is where significant numbers of observations are lost.

drop if d_inflmedian==. | d_wagemedian==. | d_longinflmedian==.

*generating real wage expectation as nominal wage density median minus inflation density median
gen rw_expect = d_wagemedian - d_inflmedian
*generating nominal wage growth uncertainty as interquartile range of nominal wage expectation distribution
gen d_wageiqr = d_wagequant2 - d_wagequant1
*generating inflation uncertainty as interquartile range of one-year-ahead inflation expectation distribution 
gen d_infliqr = d_inflquant2 - d_inflquant1
*long-run inflation uncertainty (analogous to above)
gen d_longinfliqr = d_longinflquant2 - d_longinflquant1
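* Optional diagnostic (a sketch; assumes quant1 and quant2 are the lower and upper quartiles,
* as the comments above indicate): an interquartile range should not be negative, so each
* count below is expected to be 0. count only reports a number and never stops the do-file.
count if d_wageiqr < 0
count if d_infliqr < 0
count if d_longinfliqr < 0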


drop month year 
*new variable "exp_month" refers to calendar month on day that expectations survey was completed (previously it was just a number between 1 and 12) 
gen exp_month = mofd(date)
format exp_month %tm
*exp_quarter refers to calendar quarter in which expectations survey was completed
gen exp_quarter = qofd(date)
format exp_quarter %tq

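*in the expectations survey, date is the survey completion date, i.e. the date on which the expectations were reported
*the spending file also has a variable named date, and the first merge below keeps the master (spending) value of same-named variables, so the expectations date is copied to exp_date to keep it available after merging
gen exp_date = date
format exp_date %td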

**
preserve
*generating "spend_month" variable for merging with spending data--expectations from days 1-10 of a month are assigned spend_month=month; otherwise assigned spend_month=month+1
*that is, "current" expectations were formed either in the first 10 days of the spending month or in days 11-31 of the previous month--if not available, will allow expectations from days 1-10 of the previous month
*should give priority to the expectations formed in first 10 days of the month over those dated late in the previous month if both are available--this achieved just below and only affects 12 cases
sort prim_key exp_month
gen spend_month=.
format spend_month %tm
replace spend_month=exp_month if day<=10
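*Stata treats a missing value as larger than any number, so a missing day is excluded explicitly
replace spend_month=exp_month+1 if day>10 & !missing(day)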
rename day exp_day
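* Illustration only (hypothetical dates, not taken from the data): the day-10 rule above maps
* an expectations survey completed on 05jan2012 to spending month 2012m1, and one completed on
* 20jan2012 to spending month 2012m2. The asserts use constants only and do not touch the data
* in memory; ex_early and ex_late are locals introduced just for this example.
local ex_early = td(05jan2012)
local ex_late = td(20jan2012)
assert cond(day(`ex_early')<=10, mofd(`ex_early'), mofd(`ex_early')+1) == tm(2012m1)
assert cond(day(`ex_late')<=10, mofd(`ex_late'), mofd(`ex_late')+1) == tm(2012m2)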

bysort prim_key spend_month: egen maxdate=max(date)
format maxdate %td 
*keeping only the latest-dated expectations survey completed within a spending-month group: 
*these will tend to be those formed early on in the same month as the spending, rather than late in the prior month--only drops 6 observations total
keep if date==maxdate
drop if maxdate==.
drop maxdate
drop if spend_month==.
*xtset on household and spending month
xtset unique spend_month
*generate lagged inflation expectations: most recent lag, up to 12 months prior to current spending month
gen lag_IE=L1.d_inflmedian 
forvalues i=2/12 {
replace lag_IE=L`i'.d_inflmedian if lag_IE==.
}
*generate lagged inflation uncertainty: same as above 
gen lag_infl_iqr=L1.d_infliqr 
forvalues i=2/12 {
replace lag_infl_iqr=L`i'.d_infliqr if lag_infl_iqr==.
}
*results in 521 missing values for lagged IE or lagged uncertainty (same number missing for both)
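* Optional diagnostic (sketch): the loops above take the most recent available lag, so a
* respondent whose earlier surveys map to spending months t-3 and t-7 gets the t-3 value.
* The counts below report how many observations have no lagged value within 12 months,
* for comparison with the figure quoted above.
count if missing(lag_IE)
count if missing(lag_infl_iqr)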

*this is the concurrent monthly expectations (M1):
tempfile monthly_exp_curr
save `monthly_exp_curr'

restore
**

*generating second-choice "current" expectations (expectations from days 1-10 of previous month---may be assigned as "current" if no expectations available for days 11-31 of previous or first 10 days of spending month)
*need to assign lagged expectations separately in this case to avoid overlap 
sort prim_key exp_month
gen spend_month=.
format spend_month %tm
*expectations from days 1-10 of a given month are assigned to following spending month--will only be allowed if first merge yields no match for the spending month; means that some expectations may be used as "current" for two consecutive months
replace spend_month=exp_month+1 if day<=10
*above generates many missing values for spend_month--that's OK
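* Illustration only (hypothetical date, not taken from the data): under this second-choice rule
* an expectations survey completed on 05dec2011 is assigned to spending month 2012m1. Under the
* first-choice rule the same survey is also "current" for 2011m12, which is the reuse across
* two consecutive months noted above. The asserts use constants only.
assert mofd(td(05dec2011)) + 1 == tm(2012m1)
assert mofd(td(05dec2011)) == tm(2011m12)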
*generating an alternate spending month for purposes of generating lagged IE in a consistent way for the "second-choice" expectations merging
gen spend_month2=exp_month+1 
format spend_month2 %tm
*keeping only the latest-dated expectations survey completed within a spending-month window: drops 12 observations total 
bysort prim_key spend_month2: egen maxdate=max(date)
format maxdate %td 
keep if date==maxdate
*maxdate is missing for 4 observations 
drop if maxdate==.
drop maxdate
xtset unique spend_month2

*generate lagged inflation expectations: most recent lag, up to 12 months prior to current spending month (defined so that there will be no overlap between current and lagged)
gen lag_IE=L1.d_inflmedian 
forvalues i=2/12 {
replace lag_IE=L`i'.d_inflmedian if lag_IE==.
}
*generate lagged inflation uncertainty: same as above 
gen lag_infl_iqr=L1.d_infliqr 
forvalues i=2/12 {
replace lag_infl_iqr=L`i'.d_infliqr if lag_infl_iqr==.
}
drop if spend_month==.
rename day exp_day

tempfile monthly_exp_curr_extra
save `monthly_exp_curr_extra'
*may need to drop quarter and recreate it to match the spending month ;  not if not using durables/quarterly data
*save data/monthly_exp_curr_extra.dta, replace

use `nondurables', clear
egen unique_temp = group(prim_key)
xtset unique_temp spend_month
/*gen l1car = L1.car
gen l2car = L2.car
gen l3car = L3.car
*this is the sum of recent car
egen rec_car = rowtotal(l1car  l2car  l3car)
replace rec_car = . if rec_car == 0 & spend_month<594
*this is a dummy for recent car spending
gen recent_car = 1 if rec_car>0 & rec_car!=.
replace recent_car = 0 if rec_car==0
*/

*now merging nondurables spending with expectations (including lagged IE) assigned to the spending month: either those from days 1-10 of the spending month (if avail) or from days 11-31 of previous month
merge 1:1 prim_key spend_month using `monthly_exp_curr', gen(merge_status)
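* Optional diagnostic (sketch): tabulate the merge result before splitting the data.
* 1 = spending observation with no first-choice expectations, 2 = expectations observation
* with no spending record for that month, 3 = matched on prim_key and spend_month.
tab merge_status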
preserve
*sum rage, d
** keeps only if there is a time-match between spending month in expectations data and spending month in spending data
keep if merge_status==3
drop merge_status
*sum rage, d

sort prim_key spend_month
tempfile monthly_matches_first
save `monthly_matches_first'

restore
*goes to point just after merge
*keep things that did not have a match in the expectations data set
keep if merge_status==1
drop merge_status

*now merging expectations from days 1-10 of previous month, only for observations not previously matched
*merge 1:1 prim_key spend_month using `monthly_exp_curr_extra', gen(merge_status) update
*as a robustness check (because this generates overlap/duplication in expectations across adjacent months in some cases) I can just skip this step, but this means fewer observations
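*why "update replace" is needed: the first merge already created the expectations variables in the master data, and they are all missing for the unmatched observations kept here
*without "update", merge keeps the master values of same-named variables, so the expectations would stay missing; "update" fills missing master values from the using file
*"replace" additionally overwrites nonmissing master values of any variable present in both files with the nonmissing using values, i.e. the values for the correct month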
merge 1:1 prim_key spend_month using `monthly_exp_curr_extra', gen(merge_status) update replace
keep if merge_status>2 & merge_status!=. 
drop merge_status
append using `monthly_matches_first'
sort prim_key spend_month
drop if prim_key==""
egen unique_id=group(prim_key)
xtset unique_id spend_month
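* Optional diagnostic (a sketch; assumes exp_month and exp_day, created earlier in this file,
* are still present after both merges): counts the matched observations that rely on
* second-choice expectations, i.e. those reported in days 1-10 of the month before the
* spending month. First-choice matches cannot satisfy this condition.
count if exp_month == spend_month - 1 & exp_day <= 10 & !missing(exp_month)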


*note that in the merged dataset the "tsend" date refers to completion of the spending survey--expectations survey date information retained as "exp_date" etc. This renders matching more transparent in data
save ../Data/matched_data_nondurables_jun2018_baseline.dta, replace
// save ../Data/matched_data_nondurables_jun2021_baseline.dta, replace

